API update (data & processing) #25

Merged
merged 15 commits into from
Feb 24, 2024

percevalw
Member

Changelog

Added

  • New unified edspdf.data API (PDF files, pandas, parquet) and LazyCollection object
    to efficiently read / write data from / to different formats & sources. This API
    has been heavily inspired by the edsnlp.data API.
  • New unified processing API to select the execution backend via data.set_processing(...),
    replacing the old accelerators API (which is now deprecated, but still available).
  • eds.huggingface-embedding now supports quantization and other AutoModel.from_pretrained kwargs
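
To illustrate the idea behind a lazy collection with a selectable execution backend, here is a minimal, self-contained sketch. It does not use edspdf and is not its implementation: the class ToyLazyCollection, its map method, and the "simple" / "threads" backend names are hypothetical, chosen only to show how steps can be recorded first and executed later by whichever backend is selected.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyLazyCollection:
    """Hypothetical stand-in for a lazy collection (NOT edspdf's class)."""

    def __init__(self, source, ops=(), backend="simple", num_workers=1):
        self.source = source            # any iterable of input items
        self.ops = tuple(ops)           # recorded steps, not yet executed
        self.backend = backend
        self.num_workers = num_workers

    def map(self, fn):
        # Record the step and return a new collection; nothing runs here.
        return ToyLazyCollection(
            self.source, self.ops + (fn,), self.backend, self.num_workers
        )

    def set_processing(self, backend="simple", num_workers=1):
        # Choose how the recorded steps will be executed at iteration time.
        if backend not in ("simple", "threads"):
            raise ValueError(f"unknown backend: {backend!r}")
        return ToyLazyCollection(self.source, self.ops, backend, num_workers)

    def _apply(self, item):
        # Run every recorded step, in order, on a single item.
        for fn in self.ops:
            item = fn(item)
        return item

    def __iter__(self):
        # Execution happens only when the collection is consumed.
        if self.backend == "threads":
            with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
                # Executor.map yields results in input order.
                yield from pool.map(self._apply, self.source)
        else:
            for item in self.source:
                yield self._apply(item)


lazy = ToyLazyCollection(["page one", "page two!"]).map(str.upper).map(len)
threaded = lazy.set_processing(backend="threads", num_workers=2)
print(list(lazy), list(threaded))
```

Both collections describe the same steps; only the backend used when they are iterated differs, which is the separation the unified processing API above aims for.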

Fixed

  • eds.huggingface-embedding now resizes bbox features for large PDFs instead of crashing the model

@percevalw percevalw linked an issue Feb 9, 2024 that may be closed by this pull request
@percevalw percevalw force-pushed the api-update branch 4 times, most recently from f098dc0 to 7bacbfc Compare February 9, 2024 02:18

codecov bot commented Feb 9, 2024

Codecov Report

Attention: 15 lines in your changes are missing coverage. Please review.

Comparison is base (06c527b) 98.39% compared to head (85b02eb) 98.60%.
Report is 1 commit behind head on main.

Files                                  Patch %   Lines
edspdf/processing/multiprocessing.py   98.83%    4 Missing ⚠️
edspdf/data/files.py                   97.97%    2 Missing ⚠️
edspdf/data/parquet.py                 98.03%    2 Missing ⚠️
edspdf/processing/simple.py            96.49%    2 Missing ⚠️
edspdf/trainable_pipe.py               98.24%    2 Missing ⚠️
edspdf/data/pandas.py                  97.77%    1 Missing ⚠️
edspdf/pipeline.py                     99.10%    1 Missing ⚠️
edspdf/utils/lazy_module.py            96.77%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #25      +/-   ##
==========================================
+ Coverage   98.39%   98.60%   +0.20%     
==========================================
  Files          36       46      +10     
  Lines        2370     3012     +642     
==========================================
+ Hits         2332     2970     +638     
- Misses         38       42       +4     

@percevalw percevalw force-pushed the api-update branch 8 times, most recently from 04780b1 to 72b6688 Compare February 16, 2024 12:58
@percevalw percevalw merged commit 1c76a5a into main Feb 24, 2024
12 checks passed
Development

Successfully merging this pull request may close these issues.

Feature request: deprecate accelerators and follow edsnlp.data-like API